Global Climate Change Analysis

The dataset for this project comes from the Climate Change: Earth Surface Temperature Data collection.

Background:

Some say climate change is the biggest threat of our age while others say it’s a myth based on dodgy science. We are turning some of the data over to you so you can form your own view.

Problem Statement:

In this problem we perform an in-depth analysis of how the climate has changed over many years. We also build a model capable of forecasting the temperature of India.

Dataset Information

The dataset originates from the Berkeley Earth Surface Temperature Study. It combines 1.6 billion temperature reports from 16 pre-existing archives. It is nicely packaged and allows for slicing into interesting subsets (for example by country). They publish the source data and the code for the transformations they applied. They also use methods that allow weather observations from shorter time series to be included, meaning fewer observations need to be thrown away.

In this dataset, we have several files:

  1. Global Land and Ocean-and-Land Temperatures - GlobalTemperatures.csv

  2. Global Average Land Temperature by Country - GlobalLandTemperaturesByCountry.csv

  3. Global Average Land Temperature by State - GlobalLandTemperaturesByState.csv

  4. Global Land Temperatures By Major City - GlobalLandTemperaturesByMajorCity.csv

  5. Global Land Temperatures By City - GlobalLandTemperaturesByCity.csv

Table of Contents

  1. Environment Setup
  2. Load datasets
  3. Data Cleaning
  4. Feature Engineering
  5. Climate Change Analysis
  6. Climate Change in India
  7. Time Series Modeling
  8. Conclusion

1. Environment Setup

1.1. Install Packages

Install required packages

1.2. Load Dependencies

Import required packages

2. Load datasets

We will load all the required datasets using the pd.read_csv() function.

Read Global Land and Ocean-and-Land Temperatures from GlobalTemperatures.csv file

Read Global Average Land Temperature by Country from GlobalLandTemperaturesByCountry.csv file

Read Global Land Temperatures By Major City from GlobalLandTemperaturesByMajorCity.csv file
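
The three reads above can be sketched in one helper. This is a minimal sketch, assuming the CSV files sit together in one directory; the helper name load_datasets and the short keys are hypothetical and not taken from the original notebook.

```python
from pathlib import Path

import pandas as pd

# File names as listed in the dataset description; the short keys are illustrative.
DATASET_FILES = {
    "global": "GlobalTemperatures.csv",
    "country": "GlobalLandTemperaturesByCountry.csv",
    "major_city": "GlobalLandTemperaturesByMajorCity.csv",
}


def load_datasets(data_dir="."):
    """Read each CSV in DATASET_FILES into a DataFrame, keyed by its short name."""
    data_dir = Path(data_dir)
    return {
        name: pd.read_csv(data_dir / filename)
        for name, filename in DATASET_FILES.items()
    }
```

Calling load_datasets("path/to/data") returns a dictionary, so each frame can be pulled out by key, for example frames["country"].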

Note:

There are many missing values as well as noise in the datasets read so we need to clean them.

3. Data Cleaning

Data cleaning refers to preparing data for analysis by removing or modifying data that is incomplete, irrelevant, duplicated, or improperly formatted.

3.1. Missing Data Treatment

Since we are dealing with time series data, we will drop the columns with the most missing data and, for the remaining data, drop only those rows where the land temperature features are missing.

Handling missing data in Global Land and Ocean-and-Land Temperatures dataset

Missing data in Global Average Land Temperature by Country dataset

Missing data in Global Land Temperatures By Major City dataset
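
The missing-data treatment described in this section might look like the sketch below. The 30% threshold, the helper names, the column names, and the toy values are all assumptions chosen for illustration; the original notebook may have selected columns differently.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df, max_missing_ratio=0.3):
    """Drop columns whose share of missing values exceeds max_missing_ratio."""
    missing_ratio = df.isna().mean()
    keep = missing_ratio[missing_ratio <= max_missing_ratio].index
    return df[keep]


def drop_rows_missing(df, columns):
    """Drop rows that have a missing value in any of the given columns."""
    return df.dropna(subset=columns).reset_index(drop=True)


# Toy frame with made-up values (not real measurements).
toy = pd.DataFrame({
    "dt": ["1750-01-01", "1750-02-01", "1750-03-01", "1750-04-01"],
    "LandAverageTemperature": [3.0, np.nan, 5.6, 8.5],
    "LandMaxTemperature": [np.nan, np.nan, np.nan, 14.0],
})
cleaned = drop_rows_missing(drop_sparse_columns(toy), ["LandAverageTemperature"])
```

Here LandMaxTemperature is dropped as a column because three of its four values are missing, and the one row without a land average temperature is dropped afterwards.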

3.2. Handling Noise

In all the datasets, dt is a date feature stored as type object, so we need to convert it to datetime. To achieve this we will use the pd.to_datetime() function of pandas.
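
A minimal sketch of the conversion, using a toy frame with made-up values. Passing errors="coerce" is an assumption added here so that unparsable strings become NaT instead of raising; the original notebook may have called pd.to_datetime() without it.

```python
import pandas as pd

# Toy frame with made-up values; the last date string is deliberately invalid.
toy = pd.DataFrame({
    "dt": ["1849-01-01", "1849-02-01", "not a date"],
    "AverageTemperature": [26.7, 27.4, 27.9],
})

# Convert the object column to datetime64; invalid entries become NaT.
toy["dt"] = pd.to_datetime(toy["dt"], errors="coerce")
```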

4. Feature Engineering

In our datasets we will create new features such as year and month from the existing feature dt. This process of creating new features from existing ones is called feature engineering.
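
Deriving year and month from dt can be sketched as follows. The helper name add_date_features is hypothetical, and the toy dates are for illustration only.

```python
import pandas as pd


def add_date_features(df, date_col="dt"):
    """Return a copy of df with integer year and month columns derived from date_col."""
    out = df.copy()
    dates = pd.to_datetime(out[date_col])
    out["year"] = dates.dt.year
    out["month"] = dates.dt.month
    return out


toy = pd.DataFrame({"dt": ["2012-11-01", "2013-04-01"]})
features = add_date_features(toy)
```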

5. Climate Change Analysis

We will be analyzing climate change as:

5.1. Global Land and Ocean Temperature Change

Land Average Temperature from 1750 to 2010

To analyze the temperature change over the years, we will group the dataset by year.
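
Grouping by year could be done along these lines. The column name LandAverageTemperature is assumed from the dataset's usual layout, and the toy values are made up.

```python
import pandas as pd


def yearly_mean(df, value_col="LandAverageTemperature", date_col="dt"):
    """Average value_col over each calendar year found in date_col."""
    years = pd.to_datetime(df[date_col]).dt.year.rename("year")
    return df.groupby(years)[value_col].mean()


# Toy frame with made-up values.
toy = pd.DataFrame({
    "dt": ["1750-01-01", "1750-07-01", "1751-01-01", "1751-07-01"],
    "LandAverageTemperature": [2.0, 14.0, 3.0, 15.0],
})
yearly = yearly_mean(toy)
```

The result is a Series indexed by year, which can be passed straight to a line plot.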

From the charts you can see :

Average temperature in each season

To analyse the temperature change in each season, we will create a new feature called season from the month feature.
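
One way to build the season feature is a month-to-season lookup. The mapping below assumes Northern Hemisphere meteorological seasons (December to February as winter, and so on); the notebook's own mapping is not shown in the text, so treat this as an illustration. The toy values are made up.

```python
import pandas as pd

# Assumed mapping: Northern Hemisphere meteorological seasons.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def add_season(df, month_col="month"):
    """Return a copy of df with a season column derived from month_col."""
    out = df.copy()
    out["season"] = out[month_col].map(SEASON_BY_MONTH)
    return out


# Toy frame with made-up values.
toy = pd.DataFrame({
    "month": [1, 4, 7, 10, 12],
    "LandAverageTemperature": [2.0, 8.0, 14.0, 9.0, 3.0],
})
seasonal_mean = add_season(toy).groupby("season")["LandAverageTemperature"].mean()
```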

Note: We can clearly see that the temperature increases over time. Even winters are getting warmer.

5.2. Global Land Temperature by Country

Note:

  1. Since some continents are included as countries, we will discard them.
  2. Some countries appear twice, once including their colonies and once on their own. Denmark, for example, is present both as Denmark and as Denmark (Europe).
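
The two clean-up steps in the note can be sketched as below. The continent list, the choice to keep the "(Europe)" variant and drop the bare name, and the helper name clean_countries are all assumptions; the original notebook may have resolved the duplicates differently. The toy values are made up.

```python
import pandas as pd

# Assumed list of continent labels that may appear in the Country column.
CONTINENTS = {
    "Africa", "Antarctica", "Asia", "Europe",
    "North America", "Oceania", "South America",
}


def clean_countries(df, country_col="Country", suffix=" (Europe)"):
    """Drop continent rows and collapse 'X' / 'X (Europe)' pairs into a single 'X'."""
    out = df[~df[country_col].isin(CONTINENTS)].copy()
    # Names such as "Denmark" that also exist as "Denmark (Europe)".
    has_suffix = out[country_col].str.endswith(suffix)
    base_names = set(out.loc[has_suffix, country_col].str[: -len(suffix)])
    # Assumption: keep the suffixed variant and drop the bare one.
    out = out[~out[country_col].isin(base_names)].copy()
    out[country_col] = out[country_col].str.replace(suffix, "", regex=False)
    return out.reset_index(drop=True)


# Toy frame with made-up values.
toy = pd.DataFrame({
    "Country": ["Asia", "Denmark", "Denmark (Europe)", "India"],
    "AverageTemperature": [10.0, -1.0, 8.0, 24.0],
})
cleaned = clean_countries(toy)
```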

Mapping of average temperatures in the countries

Sort the countries by the average temperature and plot them

Note: We can clearly see that Greenland is the coldest and Djibouti the hottest country on the planet, with average temperatures of -18.587458ºC and 28.816603ºC respectively.

Countries with the highest temperature differences

Now let's look at the top 15 countries with highest temperature differences. Temperature difference is the difference between the maximum and minimum temperature value.
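
The temperature difference defined above can be computed per country with a grouped min/max. This is a sketch with an assumed column layout (Country, AverageTemperature); the Kazakhstan values are the ones quoted in this section, while "Exampleland" and its values are made up.

```python
import pandas as pd


def temperature_range_by_country(df, n=15):
    """Return the n countries with the largest max-minus-min temperature difference."""
    stats = df.groupby("Country")["AverageTemperature"].agg(["min", "max"])
    stats["difference"] = stats["max"] - stats["min"]
    return stats.sort_values("difference", ascending=False).head(n)


toy = pd.DataFrame({
    "Country": ["Kazakhstan", "Kazakhstan", "Exampleland", "Exampleland"],
    "AverageTemperature": [-23.601, 25.562, 20.0, 25.0],
})
top = temperature_range_by_country(toy)
```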

Note: As we can see, Kazakhstan has the largest difference between the lowest and highest recorded temperature, with a minimum of -23.601ºC and a maximum of 25.562ºC.

5.3. Global Land Temperature by Major City

List of the ten hottest cities in 1980

List of the ten hottest cities in 2010
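
A ranking like the two lists above could be produced with a helper along these lines. The column names (dt, City, AverageTemperature) are assumed, and the city names and values in the toy frame are made up.

```python
import pandas as pd


def hottest_cities(df, year, n=10):
    """Return the n cities with the highest mean temperature in the given year."""
    dates = pd.to_datetime(df["dt"])
    in_year = df[dates.dt.year == year]
    return in_year.groupby("City")["AverageTemperature"].mean().nlargest(n)


# Toy frame with made-up cities and values.
toy = pd.DataFrame({
    "dt": ["1980-01-01", "1980-07-01", "1980-01-01", "1980-07-01", "2010-01-01"],
    "City": ["A", "A", "B", "B", "A"],
    "AverageTemperature": [20.0, 30.0, 27.0, 29.0, 35.0],
})
ranking_1980 = hottest_cities(toy, 1980, n=2)
```

Running it once per year gives two rankings that can be compared side by side.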

As we can see, the ranking changed in several ways over the thirty years from 1980 to 2010:

  1. Madras and Jiddah switched places.
  2. Mogadishu moved down two ranks, from 6th place to 8th.
  3. Ho Chi Minh City moved up 2 ranks, from 7th place to 5th.
  4. Rangoon moved up 2 ranks, from 8th place to 6th.
  5. Fortaleza moved up 2 ranks from 9th place to 7th.
  6. Cities Surabaya and Hyderabad are no longer in the top ten, replaced by Ahmadabad and Lagos.

6. Climate Change in India

In India the meteorological department follows the international standard of four seasons with some local adjustments:

Note:

Analyzing Major Indian Cities

We can see that, in general, the temperature in the major cities of India has changed drastically over the decades.

Indian cities with the highest temperature differences

7. Time Series Modeling

Many quantities are time dependent: today's values can be related to values that occurred in the past.

Some examples are the demand for products during a certain period, the harvest of commodities, stock prices and, of course, what we will try to predict: the temperature of Bombay.

There are several types of time series forecasting models. In this notebook we will use ARIMA and Seasonal ARIMA models.

Get the data ready

7.1. Visualize the Time Series

Let's plot the series and check its behavior.

Check for seasonality

The series seems to have some seasonality. To make things clearer, let's merge these lines into a single line by averaging the monthly levels.
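
Averaging the monthly levels amounts to grouping the series by calendar month. The sketch below assumes the series has a DatetimeIndex; the helper name monthly_profile and the synthetic values are illustrative only.

```python
import numpy as np
import pandas as pd


def monthly_profile(series):
    """Average a series by calendar month (the index must be a DatetimeIndex)."""
    return series.groupby(series.index.month).mean()


# Synthetic two-year monthly series whose value equals the month number.
index = pd.date_range("2000-01-01", periods=24, freq="MS")
toy = pd.Series(np.tile(np.arange(1.0, 13.0), 2), index=index)
profile = monthly_profile(toy)
```

The result has twelve entries, one per month, so a single line shows the seasonal shape.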

Important Inferences

The series clearly has some seasonality.

Check for Trend

Important Inferences

There is a steady upward trend: the average temperature increased from 26.35ºC to 27.2ºC, that's 3.1% in over 111 years.

7.2. Stationarize the Series

To create a time series forecast, the series must be stationary.

Conditions for Stationarity:

  1. Time series should have a constant mean.
  2. Time series should have a constant standard deviation.
  3. Time series’s auto-covariance should not depend on time.

Check for Stationarity

One way to check whether the series is stationary is to perform the Augmented Dickey-Fuller (ADF) test, available as adfuller in statsmodels. Alongside the test, we inspect the ACF and PACF plots of the series.

After performing Adfuller test if p-value is

Let's create a function which checks the stationarity and plots:

Important Inferences

The series has an interesting behavior: there is a sequence of significant negative autocorrelations starting at lag 8 and repeating every 12 months. This comes from the difference between the seasons: if today is winter with cold temperatures, in 6 months we will have higher temperatures in the summer, so the two values tend to move in opposite directions. That is why the negative autocorrelation occurs.

Also, starting at lag 12 and at every 12th lag after it, there is a significant positive autocorrelation. The PACF shows a positive spike at the first lag and drops to negative values at the following lags.

Important Inferences

Initially I'm going to work with the (p, d, q) order (3, 0, 0) and the seasonal (P, D, Q, S) order (0, 1, 1, 12). As the series has a clear uptrend, I'm going to include the trend in the model.

7.3. Find Optimal Parameters

We will use grid search to find the optimal set of parameters that yields the best performance for our model

Important Inference

We obtained the best AIC score of 2132.43 with parameters (1,1,1) and seasonal parameters (0,1,1,12).

7.4. Build SARIMA Model

The model diagnostics indicate that the model residuals are approximately normally distributed.

The predicted values align well with the true values. The above plot shows the observed values and the rolling forecast predictions (a rolling forecast is an add/drop process for predicting the future over a set period of time).

Calculating MSE and RMSE
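
MSE is the mean of the squared forecast errors and RMSE is its square root, which brings the error back to the unit of the data (ºC here). A minimal sketch with made-up values; the helper name mse_rmse is hypothetical.

```python
import numpy as np


def mse_rmse(y_true, y_pred):
    """Return (MSE, RMSE) between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = float(np.mean((y_pred - y_true) ** 2))
    return mse, float(np.sqrt(mse))


# Made-up observed and predicted temperatures, for illustration only.
mse, rmse = mse_rmse([26.0, 27.0, 28.0], [26.5, 27.0, 27.5])
```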

7.5. Make Predictions

We will predict the temperature of Bombay for the next 13 months, i.e. until 2014-01-31.

Save Forecasting

Save the SARIMA Model


8. Conclusion


During my research I found a global upward trend in temperature, particularly over the last 30 years. This is attributed to the intensive activity of humankind. In more developed countries, temperatures began to be recorded much earlier. Over time the accuracy of the observations has increased, which is quite natural. Mankind must reflect and take all necessary measures to reduce emissions of greenhouse gases into the atmosphere.

Additionally, I have built a Seasonal-ARIMA model to forecast the temperature of Bombay.

Model            MSE    RMSE
Seasonal-ARIMA   0.24   0.49

The built model is then used to predict the temperature of Bombay for the year 2013.

According to the forecast, Bombay will record its highest temperature, 28.55ºC, in April, i.e. during summer. Additionally, the monsoon season is going to be cooler and the temperature will rise in the post-monsoon period. The temperature in winter will remain the same, i.e. 25ºC.